Efficacy Challenges

Author! Author! Some Pointers When Submitting a Paper

Unfortunately, when invoking statistical significance, some authors claim that a treatment has an effect, even against their own evidence.

November 1, 2023

By: Paolo Giacomoni

Consultant

Cosmetic science is a field of research aimed at providing cosmetic benefits. Cosmetic scientists interact with academic scientists in a variety of fields, including photobiology, physical chemistry, dermatology, organic chemistry and cell biology. Research to learn about the formation or the removal of an age spot, for instance, is intellectually as “noble” as research to learn about intermolecular energy transfer… as long as the related experiments are designed and performed abiding by the same controlled, rigorous methodology.

Why ‘Peer-Reviewed?’

When developing a hair loss treatment, keep in mind that statistically significant results aren’t always relevant.

Why should cosmetic results be published in peer-reviewed scientific journals? Articles found in scientific journals have been reviewed by at least two scientists who are experts in the field and who check that the methodology is described in sufficient detail and that the results are correctly interpreted. What does this mean in practice? Take the example of a beneficial effect reported after treating the skin with a certain botanical extract. How can we check that the claim corresponds to reality? The only way is to repeat the experiments. Therefore, the experiment itself must be described in detail, in papers generally organized into Introduction, Material and Methods, Results, and Discussion sections.

The goal of the Introduction is to present the question and provide the rationale for the experiments undertaken to answer it.

In the Material and Methods section, the authors should describe the experiments so that the reader can repeat the work. In this example, the preparation of the extract should be given in detail. There is a profound difference between writing:

the leaves of the plant “So-and-so” were extracted by incubating with a water/ethanol mix and the extract was dried by evaporation

and writing:

To learn about possible seasonal variations, the leaves of the plant “So-and-so” were harvested from a single tree in March, June and September, and were separately incubated in a 50% ethanol/water solution (100 grams of leaves per one liter solution) at 25 degrees Celsius for 6 hours and dried on a glass rotary evaporator (name of the Company, City and State) to yield extracts E-March, E-June, and E-September.

There is a profound difference between writing:

the dry extract was suspended in the hydroalcoholic mix and served as base to prepare the cream that was used in the clinical test

and writing:

20 grams of each of the dry extracts were dissolved in 100 ml of 50% ethanol/water to form three Solutions E (E March, E June and E September). The solutions were characterized by spectrophotometry (the absorption spectra are reported in Figure 1) and analyzed by gas chromatography and mass spectrometry for specific components (results reported in Table 1). The clinical experiment was performed using a Simplex formula (composition in Table 2) either with no addition or containing 3% or 10% of Solution E March, Solution E June or Solution E September, for a total of seven formulas.
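The count of seven formulas in this example can be checked by enumerating the combinations. The following is only an illustrative sketch in Python; the labels are hypothetical strings built from the names used in the example above.

```python
from itertools import product

extracts = ["E March", "E June", "E September"]
concentrations = [3, 10]  # percent of Solution E added to the Simplex formula

# One control (Simplex alone) plus every extract/concentration pairing.
formulas = ["Simplex (no addition)"]
formulas += [
    f"Simplex + {pct}% Solution {extract}"
    for extract, pct in product(extracts, concentrations)
]

for number, formula in enumerate(formulas, start=1):
    print(f"Formula {number}: {formula}")
```

Listing the formulas explicitly in this way makes it easy for a reader to see that the control and each extract/concentration pair were all tested.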

Similarly detailed descriptions should be provided for the experimental protocol, the inclusion/exclusion criteria for the selection of the volunteers and their number, and the composition of the expert panel responsible for the evaluation of the results.

Often, in clinical experiments, the Results are expressed in qualitative terms (scores). This means that the expert panel evaluates the characteristics of the skin (e.g., roughness, luminosity, etc.) and attributes a score (e.g., for skin roughness the scores could be: 0 = smooth, 1 = mildly rough, 2 = moderately rough, 3 = severely rough, 4 = severely rough and dry, 5 = rough, dry and squamous). A topical product is considered successful against rough skin when it produces a decline in the roughness score. The results should be displayed as distribution histograms (number of people having a certain score versus the score) before and after the treatment. Comparing the histograms should allow one to visualize the shift of the distribution toward lower score values.
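As a minimal sketch of what such a display involves, the Python snippet below tabulates the number of volunteers at each score before and after treatment. The scores are hypothetical values invented for illustration, and the function names are assumptions, not part of any standard protocol.

```python
from collections import Counter

# Hypothetical panel scores (0 = smooth ... 5 = rough, dry and squamous)
# for ten volunteers; these values are illustrative, not real data.
before = [3, 4, 2, 5, 3, 4, 3, 2, 4, 3]
after = [1, 2, 1, 3, 2, 2, 1, 0, 3, 2]

def score_histogram(scores, levels=range(6)):
    """Count how many volunteers received each score level."""
    counts = Counter(scores)
    return {level: counts.get(level, 0) for level in levels}

def print_histogram(title, histogram):
    """Print a text histogram: one row per score, one '#' per volunteer."""
    print(title)
    for level, count in histogram.items():
        print(f"  score {level}: {'#' * count} ({count})")

hist_before = score_histogram(before)
hist_after = score_histogram(after)
print_histogram("Before treatment", hist_before)
print_histogram("After treatment", hist_after)
```

Printed one above the other, the two tables show the whole distribution moving toward lower scores, which is the information a reader needs to judge the effect.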

Sometimes, perhaps to save time or page space, authors summarize the results by reporting the average of the scores before and after the treatment. Unfortunately, averaging the scores is a meaningless, illegitimate procedure!

Let’s review why.

As a paradoxical example, consider a treatment that changes the color of black hair. A reasonable scoring for color change could be blue-violet (score 1), green (score 2), yellow (score 3), orange (score 4) and red (score 5). Now, let’s imagine that the treatment changes the color of black hair to yellow in 50% of the cases, and to red in the other 50% of the cases. The “average” between yellow (score 3) and red (score 5) is 4 (corresponding to orange). Should we conclude that the treatment makes the hair orange? Wouldn’t it be more appropriate to ask, “What is the biochemical difference between the hairs that turn red and those that turn yellow?”
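The hair-color paradox can be reproduced in a few lines of Python. This is only a sketch: the panel of ten volunteers is hypothetical, while the score labels are the ones proposed above.

```python
from collections import Counter
from statistics import mean

# Score labels taken from the hair-color example above.
COLOR_BY_SCORE = {1: "blue-violet", 2: "green", 3: "yellow", 4: "orange", 5: "red"}

# Hypothetical panel of ten volunteers: half turn yellow (3), half turn red (5).
scores = [3] * 5 + [5] * 5

average = mean(scores)
distribution = Counter(scores)
observed_orange = distribution.get(4, 0)

print(f"average score: {average} -> {COLOR_BY_SCORE[round(average)]}")
print(f"volunteers actually scored orange: {observed_orange}")
for score, count in sorted(distribution.items()):
    print(f"  {COLOR_BY_SCORE[score]}: {count}")
```

The average points to a color that no volunteer displays, whereas the distribution reveals the two distinct groups that deserve an explanation.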

In the Discussion section, the authors justify the experiments undertaken and state the limits of their results. They can state, for instance, that the experiment lasted six weeks and that they did not control the nutritional behavior of the volunteers, so that the results could be the consequence of a nutritional supplement and not of the topical cream. The authors can also discuss the statistical treatment of the results, when these are expressed as quantitative measurements rather than scores, and point out that the difference:

[After the treatment] minus [Before the treatment]

is larger than two standard deviations and therefore highly relevant. They can also use more sophisticated statistics and draw conclusions about the statistical significance of the results.
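A minimal sketch of the two-standard-deviation check follows, in Python. The readings are hypothetical, and the text does not say which standard deviation is meant; the sketch assumes the sample standard deviation of the before-treatment readings.

```python
from statistics import mean, stdev

# Hypothetical quantitative readings (arbitrary units) for eight volunteers.
before = [40.0, 42.0, 38.0, 41.0, 39.0, 43.0, 40.0, 37.0]
after = [46.0, 47.0, 44.0, 48.0, 45.0, 49.0, 46.0, 43.0]

def exceeds_two_sd(before, after):
    """Return (difference of means, baseline SD, whether difference > 2 SD).

    Assumption: "standard deviation" is taken as the sample standard
    deviation of the before-treatment readings.
    """
    difference = mean(after) - mean(before)
    baseline_sd = stdev(before)
    return difference, baseline_sd, difference > 2 * baseline_sd

difference, baseline_sd, relevant = exceeds_two_sd(before, after)
print(f"after - before = {difference:.2f}")
print(f"baseline SD    = {baseline_sd:.2f}")
print(f"larger than two standard deviations: {relevant}")
```

Whatever definition of the standard deviation the authors adopt, it should be stated in the paper so that the reader can repeat the calculation from the raw data.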

A major problem arises when the authors claim that the results are statistically significant but fail to display the raw data. What does this mean? Does statistical significance imply relevance? Let’s take a paradoxical example: Treating the scalp of bald volunteers with a lotion provokes the growth of exactly two hairs per square centimeter. The results are statistically significant: p<0.000001. Are they relevant? Would we pay good money to buy that lotion?
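The gap between significance and relevance can be sketched numerically in Python. The gains below are hypothetical, the relevance threshold is an assumed value chosen only for illustration, and the test shown is an ordinary one-sample t statistic, not necessarily what a given paper would use.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical hair-count gains (hairs per square centimeter) for ten volunteers.
gains = [2.0, 2.1, 1.9, 2.0, 2.1, 1.9, 2.0, 2.1, 1.9, 2.0]

# Hypothetical threshold: the smallest gain a consumer would notice.
# This number is an assumption for illustration, not a clinical standard.
MIN_RELEVANT_GAIN = 50.0

def one_sample_t(values):
    """t statistic for testing whether the mean of `values` differs from zero."""
    standard_error = stdev(values) / sqrt(len(values))
    return mean(values) / standard_error

mean_gain = mean(gains)
t_statistic = one_sample_t(gains)
is_relevant = mean_gain >= MIN_RELEVANT_GAIN

print(f"mean gain: {mean_gain:.1f} hairs per square centimeter")
print(f"t statistic: {t_statistic:.1f}")
print(f"relevant (gain >= {MIN_RELEVANT_GAIN}): {is_relevant}")
```

Because the gains are so consistent, the t statistic is very large and the corresponding p-value very small, yet the mean gain stays far below the assumed threshold. Only the raw data let the reader see both facts.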

Conclusion

When a manuscript is submitted for publication in a scientific journal, two or more experts in the field will review it. The reviewers can recommend it for publication, rejection or revision. The reviewers are the best friends of the authors and of the manuscript. Their comments are meant to improve the manuscript and to help the authors understand the problem more deeply. Sometimes, when invoking statistical significance, authors claim that a treatment has an effect, even against their own evidence. This is wrong. They should keep in mind that the opposite of a great truth is also a great truth, and that as much as it would be a great truth that the extract of “so-and-so” eliminates roughness, it would also be a great truth if the extract of “so-and-so” does not have any beneficial effect!

Paolo Giacomoni, PhD
Insight Analysis Consulting
516-769-6904

Paolo Giacomoni acts as an independent consultant to the skin care industry. He served as Executive Director of Research at Estée Lauder and was Head of the Department of Biology with L’Oréal. He has built a record of achievements through research on DNA damage and metabolic impairment induced by UV radiation as well as on the positive effects of vitamins and antioxidants. He has authored more than 100 peer-reviewed publications and has more than 20 patents. He is presently Head of R&D with L.RAPHAEL (The science of beauty), Geneva, Switzerland.